Executive Summary & Problems that Occurred
Problems that Occurred During the Project:
- To stage the transcript data, we had to extract each transcript's filename and use it to create a VideoPath value, which links the transcript to the additional video details.
- Each transcript began with unnecessary characters (i.e., "[transcript]"). We used a Derived Column transformation to remove these characters from each row.
- Two tables, VideoDetails and Transcript, served overlapping purposes, and combining them into one table was a challenge. We ended up joining them on VideoPath/VideoKey, which was the same column under different names in different server tables. Additionally, the Data Science team added columns to the transcript source that helped link the two tables. We were ultimately able to combine the VideoDetails and Transcript data into one table: VideoDetails.
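
The three fixes above can be illustrated with a minimal Python sketch. It is not the project's actual implementation, which used a Derived Column transformation and server tables. The function names, the sample rows, the Title and Transcript column names, and the choice of the filename without its extension as the VideoPath value are all assumptions made for illustration.

```python
from pathlib import PureWindowsPath

# Assumed literal prefix, taken from the example given in the text.
PREFIX = "[transcript]"


def video_path_from_filename(file_path: str) -> str:
    """Derive a join key from the transcript file name (extension dropped).

    Assumption: the key is the bare file name; the real VideoPath format
    is not specified in the report.
    """
    return PureWindowsPath(file_path).stem


def strip_prefix(text: str, prefix: str = PREFIX) -> str:
    """Remove the leading marker, if present, and trim surrounding whitespace."""
    text = text.lstrip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


def merge_on_key(video_details, transcripts):
    """Left-join transcript rows onto video rows.

    VideoKey (video side) and VideoPath (transcript side) are treated as
    the same value under two different column names.
    """
    by_path = {t["VideoPath"]: t for t in transcripts}
    merged = []
    for video in video_details:
        match = by_path.get(video["VideoKey"])
        row = dict(video)
        row["Transcript"] = match["Transcript"] if match else None
        merged.append(row)
    return merged


# Hypothetical sample rows, for illustration only.
transcripts = [
    {
        "VideoPath": video_path_from_filename(r"C:\data\transcripts\intro_01.txt"),
        "Transcript": strip_prefix("[transcript] Welcome to the course."),
    },
]
video_details = [
    {"VideoKey": "intro_01", "Title": "Introduction"},
    {"VideoKey": "intro_02", "Title": "Setup"},
]

combined = merge_on_key(video_details, transcripts)
```

A left join keeps every VideoDetails row even when no transcript matches, which mirrors the decision to consolidate everything into VideoDetails.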

Avoiding potential problems when building a data warehouse
https://www.cooladata.com/data-warehouse